Abstract. Self-organization provides a framework for the study of systems in which complex patterns emerge from simple rules, without the guidance of external agents or fine tuning of parameters. Within this framework, one can formulate a guiding principle for plasticity in the context of unsupervised learning, in terms of an objective function. In this work we derive Hebbian, self-limiting synaptic plasticity rules from such an objective function and then apply the rules to the non-linear bars problem.


1 Introduction 

Hebbian learning rules [1] are at the basis of unsupervised learning in neural networks, involving the adaptation of the inter-neural synaptic weights [2, 3]. These rules usually make use of either an additional renormalization step or a decay term in order to avoid runaway synaptic growth […].

From the perspective of self-organization […], it is interesting to study how Hebbian, self-limiting synaptic plasticity rules can emerge from a set of governing principles formulated in terms of objective functions. Information-theoretical measures, such as the entropy of the output firing-rate distribution, have been used in the past to generate rules for either intrinsic or synaptic plasticity […]. The objective function with which we work here can be motivated by the Fisher information, which measures the sensitivity of a probability distribution to a parameter. In this case it is defined with respect to the Synaptic Flux operator [13], which measures the overall increase of the synaptic weights. Minimizing the Fisher information corresponds, in this context, to looking for a steady-state solution in which the output probability distribution is insensitive to local changes in the synaptic weights. This method thus constitutes an implementation of the stationarity principle, which states that once the features of a stationary input distribution have been acquired, learning should stop, thereby avoiding runaway growth of the synaptic weights.

It is important to note that, while in other contexts the Fisher information is maximized in order to estimate a parameter via the Cramér-Rao bound, here it is defined with respect to the model's parameters, which do not need to be estimated but rather adjusted to achieve a certain goal. This procedure has been employed successfully in other fields, for instance to derive the Schrödinger equation in quantum mechanics […].


2 Methods 


We consider rate-encoding point neurons, where the output activity $y$ of each neuron is a sigmoidal function of its weighted inputs, as defined by:

$$ y = g(x), \qquad x = \sum_j w_j\,(y_j - \bar{y}_j). \tag{1} $$
Here the $y_j$ are the inputs to the neuron (which will be either the outputs of other neurons or external stimuli), the $w_j$ are the synaptic weights, and $x$ is the integrated input, which one may consider as the neuron's membrane potential. $\bar{y}_j$ represents the average of input $y_j$, so that only deviations from the average convey information. $g$ represents a sigmoidal transfer function, such that $g(x) \to 1/0$ when $x \to \pm\infty$. The output firing rate $y$ of the neuron is hence a sigmoidal function of the membrane potential $x$.
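A minimal sketch of Eq. (1) for a single neuron follows. Two choices here are assumptions, since the text above does not fix them. The explicit Fermi form $g(x) = 1/(1 + e^{b - x})$ places the bias $b$ as in (6). The averages $\bar{y}_j$ are passed in directly rather than estimated online.

```python
import math

def fermi(x, b=0.0):
    """Assumed Fermi transfer function g(x) = 1 / (1 + e^(b - x)),
    evaluated so that large |x| does not overflow math.exp."""
    z = x - b
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def neuron_output(w, inputs, means, b=0.0):
    """Eq. (1): membrane potential x from mean-subtracted inputs, rate y = g(x)."""
    x = sum(wj * (yj - mj) for wj, yj, mj in zip(w, inputs, means))
    return x, fermi(x, b)

x, y = neuron_output([1.0, -2.0], [0.7, 0.4], [0.5, 0.5])
print(x, y)
```

Only deviations from the averages contribute to $x$. In this usage example both inputs push $x$ upwards: the first lies above its mean with a positive weight, and the second lies below its mean with a negative weight.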


By minimization through stochastic gradient descent of

$$ F_{ob} = E[f_{ob}(x)], \qquad f_{ob}(x) = \big(N + A(x)\big)^2, \qquad A(x) = x\,\frac{y''(x)}{y'(x)}, \tag{2} $$

a Hebbian self-limiting learning rule for the synaptic weights can be obtained […]. Here $E[\cdot]$ denotes the expected value, averaged over the probability distribution of the inputs, and $y'$ and $y''$ are respectively the first and second derivatives of $y(x)$. $N$ is a parameter of the model (originally derived as […] and then generalized […]), which sets the values for the system's fixed points, as shown in Section 2.1.
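The objective (2) can be estimated numerically for any smooth transfer function. The sketch below assumes nothing beyond Eq. (2) itself. It obtains $y'$ and $y''$ by central finite differences and replaces the expectation by a sample mean over hypothetical membrane potentials $x$. The names `a_numeric` and `objective`, the step `h`, and the Gaussian samples are illustrative choices only.

```python
import math
import random

def a_numeric(g, x, h=1e-4):
    """A(x) = x * y''(x) / y'(x) from Eq. (2), with y = g(x) and central differences."""
    d1 = (g(x + h) - g(x - h)) / (2.0 * h)
    d2 = (g(x + h) - 2.0 * g(x) + g(x - h)) / (h * h)
    return x * d2 / d1

def objective(g, xs, N):
    """Sample estimate of F_ob = E[(N + A(x))^2] over membrane potentials xs."""
    return sum((N + a_numeric(g, x)) ** 2 for x in xs) / len(xs)

def fermi(x):
    # Assumed Fermi form with b = 0; adequate for the moderate |x| sampled below.
    return 1.0 / (1.0 + math.exp(-x))

rng = random.Random(0)
xs = [rng.gauss(0.0, 2.0) for _ in range(2000)]  # hypothetical samples of x
print(objective(fermi, xs, N=2.0))
```

For the Fermi function, the numerical $A(x)$ agrees with the closed form $x(1 - 2y)$ of (3) up to discretization error.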

In the case of an exponential, or Fermi, transfer function we obtain

$$ f_{ob}(x) = \big(N + x(1 - 2y(x))\big)^2 \tag{3} $$

for the kernel $f_{ob}$ of the objective function $F_{ob}$. The intrinsic parameter $b$ represents a bias and sets the average activity level of the neuron. This parameter can either be kept constant or be adapted, with little interference, by other standard procedures such as maximizing the output entropy […].
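The step from (2) to (3) is a short derivative computation. Assuming the Fermi function has the form $y = 1/(1 + e^{b - x})$, its derivatives satisfy

```latex
y' = y(1 - y), \qquad
y'' = \frac{d}{dx}\big(y - y^2\big) = y'(1 - 2y)
\qquad\Longrightarrow\qquad
A_{exp}(x) = x\,\frac{y''(x)}{y'(x)} = x\big(1 - 2y(x)\big).
```

The bias $b$ therefore drops out of the ratio $y''/y'$, and substituting into the kernel of (2) gives (3).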


Fig. 1(a) shows the functional dependence of $f_{ob}$. It diverges for $x \to \pm\infty$, so minimizing $f_{ob}$ keeps $x$, and therefore the synaptic weights, bounded. Minimizing (3) through stochastic gradient descent with respect to $w_j$, one obtains […]

$$ \dot{w}_j = \epsilon_w\, G(x)\, H(x)\, (y_j - \bar{y}_j) \tag{4} $$

$$ G(x) = N + x(1 - 2y), \qquad H(x) = (2y - 1) + 2x(1 - y)y \tag{5} $$

where the product $H(x)(y_j - \bar{y}_j)$ represents the Hebbian part of the update rule, with $H$ being an increasing function of $x$ or $y$. $G$ reverses sign when the activity becomes too large, which prevents runaway synaptic growth.
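The rule (4)-(5) can be checked against (3) numerically. Gradient descent on $f_{ob} = G^2$ gives $-\partial f_{ob}/\partial w_j = 2\,G(x)H(x)(y_j - \bar{y}_j)$, because $dG/dx = -H$. Hence $GH$ should equal $-\tfrac{1}{2}\,df_{ob}/dx$, with the factor 2 absorbed into $\epsilon_w$. The sketch assumes the Fermi form $y = 1/(1 + e^{b - x})$. The helper `update` is hypothetical and performs one stochastic step of (4).

```python
import math

def fermi(x, b=0.0):
    # Assumed Fermi form y = 1 / (1 + e^(b - x)); fine for the moderate x used here.
    return 1.0 / (1.0 + math.exp(b - x))

def f_ob(x, N, b=0.0):
    """Kernel of the objective function, Eq. (3)."""
    y = fermi(x, b)
    return (N + x * (1.0 - 2.0 * y)) ** 2

def G(x, N, b=0.0):
    y = fermi(x, b)
    return N + x * (1.0 - 2.0 * y)  # Eq. (5), left

def H(x, b=0.0):
    y = fermi(x, b)
    return (2.0 * y - 1.0) + 2.0 * x * (1.0 - y) * y  # Eq. (5), right

def update(w, inputs, means, N, eps_w, b=0.0):
    """One stochastic step of Eq. (4); returns the new weight list."""
    x = sum(wj * (yj - mj) for wj, yj, mj in zip(w, inputs, means))
    gh = G(x, N, b) * H(x, b)
    return [wj + eps_w * gh * (yj - mj) for wj, yj, mj in zip(w, inputs, means)]

# G*H versus -(1/2) d f_ob / dx, estimated by central differences.
h = 1e-6
max_err = max(
    abs(G(x, 2.0) * H(x) + 0.5 * (f_ob(x + h, 2.0) - f_ob(x - h, 2.0)) / (2 * h))
    for x in (-3.0, -0.5, 0.7, 2.4, 5.0)
)
print(max_err)
```

For small $x$ one has $G \approx N$ and $H \approx x$, so (4) reduces to plain Hebbian learning $\dot{w}_j \approx \epsilon_w N x (y_j - \bar{y}_j)$. The self-limiting behavior only appears once $G$ changes sign.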
Fig. 1: (a) $f_{ob}(x)$, as defined by Eq. (3), for $b = 0$ and $N = 2$. The synaptic weights are adapted through (4) such that the membrane potential $x$ tends to cluster around the two minima. (b) $A(x)$, as defined by Eqs. (2) and (7), for both the exponential and the tangential sigmoidal transfer functions and $b = 0$. By adapting the respective values of $N$, identical roots can be obtained, as illustrated graphically.


2.1 Minima of the objective function 

While (2) depends quantitatively on the specific choice of the transfer function, we show here that the resulting expressions for different transfer functions are in the end similar. As an example, we compare two choices for $g$: the exponential sigmoidal (or Fermi function) used in (3), and an arc-tangential transfer function defined as

$$ g_{tan}(x) = \frac{1}{\pi}\arctan(x - b) + \frac{1}{2}. \tag{6} $$

These two choices of $g$, in turn, define two versions of $A(x)$:

$$ A_{exp}(x) = x\big(1 - 2y(x)\big), \qquad A_{tan}(x) = -\frac{2x(x - b)}{1 + (x - b)^2}. \tag{7} $$
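The form of $A_{tan}$ in (7) follows from differentiating (6):

```latex
g_{tan}'(x) = \frac{1}{\pi}\,\frac{1}{1 + (x - b)^2}, \qquad
g_{tan}''(x) = -\frac{1}{\pi}\,\frac{2(x - b)}{\big(1 + (x - b)^2\big)^2}
\qquad\Longrightarrow\qquad
A_{tan}(x) = x\,\frac{g_{tan}''(x)}{g_{tan}'(x)} = -\frac{2x(x - b)}{1 + (x - b)^2}.
```

For $b = 0$ this gives $A_{tan}(x) = -2x^2/(1 + x^2)$, which takes values in $(-2, 0]$. This bounded range is why the intersections with $-N$ require $N$ below 2 for the tangential function.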

The objective functions are non-negative, $f_{ob} \ge 0$, being squares, and their roots

$$ A_{exp/tan}(x) = -N \tag{8} $$

hence correspond to global minima. These are illustrated in Fig. 1(b), where $A_{exp}(x)$ and $A_{tan}(x)$ are plotted for $b = 0$. The minima of $f_{ob}$ can then be easily found as the intersections of the plot of $A(x)$ with the horizontal line at $-N$. For $A_{exp}(x)$ one finds global minima for all $N > 0$, whereas $N$ needs to be within $[0, 2]$ for $A_{tan}(x)$. $N$ is, however, just a parameter of the model. The roots correspond to values of the neuron's membrane potential and lie in the same range for both transfer functions, the two roots representing a low-activity and a high-activity state, respectively.
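The intersections with the horizontal line at $-N$ can be located numerically. This sketch is restricted to $b = 0$. There, both versions of $A(x)$ in (7) are even and decrease monotonically for $x > 0$, so a plain bisection on $[0, x_{max}]$ finds the positive root, and the negative root follows by symmetry. The bracket $x_{max} = 10^3$ and the function names are arbitrary choices, and the Fermi form of $y$ is assumed as before.

```python
import math

def a_exp(x, b=0.0):
    """A_exp(x) = x(1 - 2y), Eq. (7), with the assumed Fermi y = 1/(1 + e^(b - x))."""
    y = 1.0 / (1.0 + math.exp(b - x))
    return x * (1.0 - 2.0 * y)

def a_tan(x, b=0.0):
    """A_tan(x) from Eq. (7)."""
    return -2.0 * x * (x - b) / (1.0 + (x - b) ** 2)

def positive_root(a, N, x_max=1e3, tol=1e-12):
    """Bisection for A(x) = -N on [0, x_max], for N > 0 and b = 0 (A even and
    decreasing for x > 0). Returns None if A never reaches -N in the bracket."""
    lo, hi = 0.0, x_max
    if a(hi) + N > 0:  # no finite minimum of f_ob inside the bracket
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if a(mid) + N > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for N in (0.5, 1.0, 2.0, 3.0):
    print(N, positive_root(a_exp, N), positive_root(a_tan, N))
```

For $N = 1$ the tangential root is exactly $x = 1$, since $2x^2/(1 + x^2) = 1$ there. For $N \ge 2$ only the exponential version returns a root.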

Fig. 2: Evolution of the synaptic weights for both transfer functions, the Fermi function of (3) and the arc-tangential function (6). The continuous line represents $w_1$, corresponding to the principal component. A representative subset of the $N_w - 1 = 99$ other weights is presented as dotted lines. Top: exponential transfer function. Bottom: tangential transfer function.

While both rules display a similar behavior, they are not identical. For the exponential transfer function, $f_{ob}$ diverges for $x \to \pm\infty$, keeping the weights bounded regardless of the dispersion in the input distribution. For the tangential transfer function, the maxima for $x \to \pm\infty$ are of finite height, and this height decreases with $N$. This makes the rule unstable for noisy input distributions at larger values of $N$.
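The finite asymptotic height can be made explicit for $b = 0$. From (7), $A_{tan}(x) \to -2$ as $x \to \pm\infty$, so the tangential kernel saturates at $(N - 2)^2$, which shrinks as $N$ approaches 2. By contrast, $A_{exp}(x) \to -|x|$, and the exponential kernel keeps growing like $(N - |x|)^2$. The short check below evaluates both kernels far from the origin, assuming the Fermi form for $y$.

```python
import math

def fermi0(x):
    # Assumed Fermi form with b = 0, written to avoid overflow for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def f_ob_exp(x, N):
    """Eq. (3) with b = 0."""
    y = fermi0(x)
    return (N + x * (1.0 - 2.0 * y)) ** 2

def f_ob_tan(x, N):
    """(N + A_tan(x))^2 with A_tan from Eq. (7) and b = 0."""
    return (N - 2.0 * x * x / (1.0 + x * x)) ** 2

for N in (0.5, 1.0, 1.5):
    print(f"N={N}: f_tan(1e6)={f_ob_tan(1e6, N):.4f}, "
          f"(N-2)^2={(N - 2.0) ** 2:.4f}, f_exp(50)={f_ob_exp(50.0, N):.1f}")
```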


2.2 Applications: PCA and the non-linear bars problem 

In [13], the authors showed how a neuron operating under these rules is able to find the first principal component (PC) of an ellipsoidal input distribution. Here we present the neuron with Gaussian activity distributions $p(y_j)$, truncated so that $y_j \in [0, 1]$. A single component, in this case $y_1$, has standard deviation $\sigma$, and all other $N_w - 1$ directions have a smaller standard deviation of $\sigma/2$ (the rules are, however, completely rotation invariant). As an example, we have taken $N_w = 100$ and show that the neuron is able to find the PC with both transfer functions.
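A small-scale sketch of this experiment follows. It does not reproduce Fig. 2, and all of its settings are assumptions: $N_w = 10$ instead of 100, $\sigma = 0.1$, $N = 2$, $\epsilon_w = 0.05$, equal initial weights of 0.01, $b = 0$ kept constant, and the known mean 0.5 used as $\bar{y}_j$. Only the Fermi transfer function is used. The point it illustrates is that, starting from small weights, the Hebbian growth phase favors the direction of largest variance, and the weights then stop growing.

```python
import math
import random

def fermi(x, b=0.0):
    # Assumed Fermi form y = 1 / (1 + e^(b - x)), written to avoid overflow.
    z = x - b
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def truncated_gauss(rng, mean, sd):
    # Rejection sampling: redraw until the value lies in [0, 1].
    while True:
        v = rng.gauss(mean, sd)
        if 0.0 <= v <= 1.0:
            return v

def train_pca(n_w=10, sigma=0.1, N=2.0, eps_w=0.05, steps=40000, seed=0):
    """Single neuron trained with Eqs. (4)-(5), b = 0 kept constant.
    Input 1 has standard deviation sigma, the other n_w - 1 inputs sigma / 2."""
    rng = random.Random(seed)
    sds = [sigma] + [sigma / 2.0] * (n_w - 1)
    mean = 0.5  # truncation is symmetric about 0.5, so this is y_bar_j
    w = [0.01] * n_w  # small, equal initial weights (hypothetical)
    for _ in range(steps):
        d = [truncated_gauss(rng, mean, sd) - mean for sd in sds]  # y_j - y_bar_j
        x = sum(wj * dj for wj, dj in zip(w, d))
        y = fermi(x)
        gx = N + x * (1.0 - 2.0 * y)                    # G(x), Eq. (5)
        hx = (2.0 * y - 1.0) + 2.0 * x * (1.0 - y) * y  # H(x), Eq. (5)
        w = [wj + eps_w * gx * hx * dj for wj, dj in zip(w, d)]  # Eq. (4)
    return w

w = train_pca()
print("w_1:", round(w[0], 2))
print("largest |w_k|, k > 1:", round(max(abs(v) for v in w[1:]), 2))
```

While $x$ is small, each weight grows at a rate proportional to the variance of its input. $w_1$ therefore outpaces the others by a factor of roughly four in the exponent before $G$ changes sign and growth stops.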

Fig. 2 presents the evolution of the synaptic weights as a function of time. In this case $b$ has been kept constant at $b = 0$. Learning stops when $\langle \dot{w} \rangle = 0$, but since the learning rule is a non-linear function of $x$, the exact final value of $w$ will vary for different transfer functions. In the case of a bimodal input distribution, such as the one used in the linear discrimination task, both clouds of points can be sent close to the minima. The final values of $w$ are then very similar, regardless of the choice of transfer function (not shown here).

Finally, we apply the rules to the non-linear bars problem. Here we follow the procedure of […]: in a grid of $L \times L$ inputs, each pixel can take two values, one for low intensity and one for high intensity. Each bar consists of a complete row or a complete column of high-intensity pixels, and each possible bar is drawn independently with probability $p = 1/L$. At the intersection of a horizontal and a vertical bar, the intensity is the same as if only one bar were present, which makes the problem non-linear. At each training step the neuron is presented with a new input drawn under these rules, and the synaptic weights are updated after each step. The bias $b$ in the model can either be adjusted as in […], by $\dot{b} \propto (y - p)$, or by maximal-entropy intrinsic adaptation, as described in [13], without major differences.

Fig. 3: (a) Some random training examples for the non-linear bars problem. (b) Graphical representation of the typical weight vectors learnt by the neuron in distinct training runs.
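The input statistics described above can be sketched as a small generator. The low and high intensities (0 and 1 here) and the function name are hypothetical. The OR-combination at intersections is what makes the problem non-linear: a pixel shared by a horizontal and a vertical bar is no brighter than a pixel covered by a single bar.

```python
import random

def bars_input(L, rng, low=0.0, high=1.0):
    """One L x L pattern for the non-linear bars problem. Each of the L rows and
    L columns is drawn as a bar independently with p = 1/L; overlapping bars are
    OR-combined (an intersection pixel is simply 'high', not summed)."""
    p = 1.0 / L
    rows = [rng.random() < p for _ in range(L)]
    cols = [rng.random() < p for _ in range(L)]
    return [[high if (rows[i] or cols[j]) else low for j in range(L)]
            for i in range(L)]

rng = random.Random(1)
pattern = bars_input(5, rng)
for row in pattern:
    print("".join("#" if v == 1.0 else "." for v in row))
```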

Since the selectivity to a given pattern is given by the value of the scalar product $\mathbf{y}_{inputs} \cdot \mathbf{w}$, there are two ways to see to which pattern the neuron is selective in the end. One can compute the output activity $y$, or simply make an intensity plot of the weights, since maximal selectivity corresponds to $\mathbf{w} \propto \mathbf{y}_{inputs}$. Fig. 3 presents a typical set of inputs, together with a typical set of learnt weights for different realizations of single-neuron training. The neuron is able to become selective to individual bars or to single points (the independent components in this problem). We also checked whether the neuron can learn single bars when such a bar is never presented in isolation. To this end, we trained the neuron with a random pair of bars, one horizontal and one vertical, and obtained similar results: the neuron can learn to fire in response to a single bar, even when that bar was never presented in isolation as a stimulus.
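A minimal sketch of the first option probes selectivity by computing the response to each of the $2L$ single-bar stimuli and picking the largest. The weight vector is a hand-made stand-in shaped like one horizontal bar, not a learnt result. Following Eq. (1), inputs are centered by the expected pixel intensity $1 - (1 - 1/L)^2$ of the 0/1 bars generator. Both choices, and the Fermi output, are assumptions for illustration.

```python
import math

def single_bar(L, index):
    """Flattened L x L image with one bar: index < L is a row, otherwise a column."""
    img = [0.0] * (L * L)
    for k in range(L):
        i, j = (index, k) if index < L else (k, index - L)
        img[i * L + j] = 1.0
    return img

def response(w, inputs, mean, b=0.0):
    # Eq. (1): x = sum_j w_j (y_j - y_bar_j), followed by the assumed Fermi output.
    x = sum(wj * (yj - mean) for wj, yj in zip(w, inputs))
    return 1.0 / (1.0 + math.exp(b - x))

L = 5
mean = 1.0 - (1.0 - 1.0 / L) ** 2              # P(pixel high) for bars with p = 1/L
w = [v - mean for v in single_bar(L, 2)]       # hypothetical weights ~ row 2
responses = [response(w, single_bar(L, k), mean) for k in range(2 * L)]
best = max(range(2 * L), key=lambda k: responses[k])
print("most selective bar:", best)
```

Because $\mathbf{w}$ is proportional to the centered row-2 pattern, that bar maximizes the scalar product. Columns, which share one pixel with it, score higher than the other rows, which share none.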


3 Discussion and Concluding Remarks 

The implementation of the stationarity principle in terms of the Fisher information, presented in […] and discussed here, results in a set of Hebbian self-limiting rules for synaptic plasticity. The sensitivity of the rule to higher moments of the input probability distribution makes it suitable for applications in independent component analysis. Furthermore, the derived learning rule is robust with respect to the choice of the transfer function $g(x)$, a requirement for biological plausibility.

In upcoming work, we study how the steady-state solutions of the neuron, and their stability, depend on the moments of the input distribution. This will provide a justification for the numerical finding of independent components in the bars problem. We will also study how a network of neurons can be trained using the same rules for all weights, feed-forward and lateral, and how clusters of input selectivity to different bars emerge in a self-organized way.